10_Format_String
The format string vulnerability exploits the unusual way the printf function works: it prints a string according to a format, and it accepts any number of arguments.
To be more precise, the signature of printf is:
int printf(const char* format, ...);
where format is the format string and ... stands for zero or more optional arguments.
Unlike an ordinary function, where each argument is accessed through its parameter name, here we have to rely on the stdarg macros defined in stdarg.h.
Let's look at an example:
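#include <stdio.h>
#include <stdarg.h>
/* Prints Narg pairs of (int, double) taken from the optional arguments. */
int myprint(int Narg, ...){
  int i;
  va_list ap;
  va_start(ap, Narg);                   /* point ap at the first optional argument */
  for(i = 0; i < Narg; i++){
    printf("%d ", va_arg(ap, int));     /* first element of the pair: an int */
    printf("%f\n", va_arg(ap, double)); /* second element of the pair: a double */
  }
  va_end(ap);
  return 0;
}
int main(){
  myprint(1, 2, 3.5);
  myprint(2, 2, 3.5, 3, 4.5);
  return 0;
}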
Initializing va_list: when a variadic function such as myprint() is invoked, all its arguments are pushed onto the stack (see picture).
Therefore, a va_list pointer can be used to access the optional arguments.
The initial position of this pointer is calculated by the va_start macro from its second argument, which should be the name of the last parameter before the optional arguments.
In our case, that is Narg.
How does this macro work? It simply takes the address of Narg, computes the size of Narg, and sets the va_list pointer to address + size.
Moving the va_list pointer: to access the optional argument pointed to by va_list, we use the va_arg() macro, which takes the va_list as its first argument and the type of the optional argument to be accessed as its second.
This macro returns the value pointed to by the va_list pointer and moves the pointer up to the next optional argument.
How? It advances va_list by the size of the type passed as the second argument.
Finishing: when the program has finished accessing the optional arguments, it calls the va_end macro.
How printf Accesses Optional Args
In our toy example, the types are fixed in advance: the optional arguments alternate between int and double.
printf, by contrast, uses its first argument to work out the type of each optional argument.
The format argument contains elements that start with %. These are called format specifiers.
printf scans the format string and, each time it reaches a format specifier, invokes va_arg, which returns the corresponding optional argument and advances the va_list pointer.
The expected type of each optional argument is determined by the format specifier, and the value is printed at the position where the format specifier appears.
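Suppose the format string contains three format specifiers but only two optional arguments are supplied:
printf("ID: %d, Name: %s, Age: %d\n",id,name);
The C language does not reject this mismatch, because the prototype of printf says nothing about how many optional arguments there are or what types they have; at best, the compiler issues a warning.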
Unfortunately, va_arg cannot tell whether it has reached the end of the optional argument list. If it is called after all the optional arguments have been consumed, it keeps fetching data from the stack, even though that data is no longer an optional argument.
If the mismatch is introduced by the programmer, the program may print wrong information, which does not seem particularly dangerous.
However, if the format string is supplied by a user who wants to exploit this mismatch, the damage can be much worse.
This is called a format string vulnerability.
#include <stdio.h>
void fmtstr(){
  char input[100];
  int var = 0x11223344;
  /* Print the address and value of the target so the attack can be followed. */
  printf("Target address: %x\n", (unsigned) &var);
  printf("Data at target address: 0x%x\n", var);
  printf("Please enter a string: ");
  fgets(input, sizeof(input)-1, stdin);
  printf(input); // Vulnerable: the user-controlled input is used as the format string
  printf("Data at target address: 0x%x\n", var);
}
int main() { fmtstr(); return 0; }
The stack layout of the vulnerable program is the following:
Inside printf, the starting point of the optional arguments is the position right above the format string argument.
Then, we make the compiled program vul a root-owned Set-UID program and turn off address space randomization:
sudo chown root vul
sudo chmod 4755 vul
sudo sysctl -w kernel.randomize_va_space=0
Suppose the user input is %s%s%s%s%s%s%s%s%s%s.
Because we have used %s, printf treats each value it fetches as an address and prints the data stored at that address as a string. If a value is not a valid address, the program crashes.
Now suppose the user input is %x.%x.%x.%x.%x.%x.
Here, for each %x, printf prints the integer value pointed to by the va_list pointer and advances the pointer by 4 bytes.
How many %x do we need? As many as it takes to cover the offset between the starting position of the va_list pointer and the secret variable var.
Goal: change the value of var.
For this purpose, we need a new format specifier: %n.
It writes the number of characters printed so far into memory.
Example:
printf("hello%n",&i);
Here, 5 characters have been printed by the time %n is reached, so printf stores 5 at the provided memory address (&i).
In this way, we can write to the program's memory: when printf encounters a %n, it expects an address.
More precisely, when printf sees a %n, it takes the value pointed to by the va_list pointer, treats it as a memory address, and writes to that location.
Hence, if we want to write a value to a memory location, we need to have its address on the stack.
Assume now that the address of var is 0xBFFFF304 (we can obtain it using gdb). Since the user input is stored on the stack, we can include the address of var at the beginning of the input.
Obviously, the bytes of this address are binary values rather than printable characters, so we need a way to write them into the input. We can do this:
$ echo $(printf "\x04\xF3\xFF\xBF").%x.%x.%x.%x.%x.%n > input
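Here:
$(command) stands for command substitution: the output of the command replaces the command itself on the command line.
"\x04" means that 04 is an actual byte value, not the two ASCII characters 0 and 4.
So, when we pass the input file to the program, the four address bytes end up in the input buffer on the stack.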
Once the address of var is on the stack, we have to move the va_list pointer to the location where this address is stored.
How? We use %x to advance the va_list pointer.
How many? The right number of format specifiers can be found by trial and error.
The result is the following:
$ vul < input
Target address: BFFFF304
Data at target address: 0x11223344
Please enter a string: ***. # the content of the input file is printed here
Data at target address: 0x2c # which is 44, the number of characters printed so far
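Next, we want to change the value of var to 0x9896a9.
We simply modify the input file as follows:
$ echo $(printf "\x04\xF3\xFF\xBF")_%.8x_%.8x_%.8x_%.8x_%.10000000x%n > input
Because of the precision modifier, each %.8x prints exactly 8 hex digits, so 41 characters (4 address bytes, 5 underscores and 4 x 8 digits) are printed before the last %x, which then prints 10000000 more. printf has therefore printed 10000000 + 41 characters when it reaches %n, and the value 10000041 (0x9896a9) is stored at 0xBFFFF304.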
We will now see another way of performing the same task, this time with the value 0x66887799.
It uses %hn to modify var two bytes at a time. Basically, we split the 4-byte variable var into 2 parts of 2 bytes each.
Since we are on a little-endian architecture (x86), the 2 least significant bytes (0x7799) are stored at 0xBFFFF304 and the 2 most significant ones (0x6688) at 0xBFFFF306.
If the first %hn writes the value x and t more characters are printed before the next %hn, the second %hn writes x + t. In other words, the values written by successive %n specifiers are cumulative, so the smaller half must be written first.
Therefore, our input becomes:
$ echo $(printf "\x06\xF3\xFF\xBF@@@@\x04\xF3\xFF\xBF")%.8x%.8x%.8x%.8x%.26204x%hn%.4369x%hn > input
The input now starts with both addresses, separated by 4 filler bytes (@@@@) that are consumed by the %.4369x between the two %hn. The first %hn is reached after 12 + 4 x 8 + 26204 = 26248 = 0x6688 characters, so 0x6688 is written to 0xBFFFF306. After 4369 more characters the count is 30617 = 0x7799, which the second %hn writes to 0xBFFFF304.
Finally, we want to modify the return address of a function so that it points to our injected malicious code.
Moreover, if the program is a Set-UID program, we can get root access.
In this case, instead of typing a string, we feed a badfile to the vulnerable program shown before.
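Here we have to inject the malicious code into the stack, find the address of that code, find the address where the return address is stored, and overwrite the return address with the address of the code.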
For simplicity, suppose that, by using gdb and some tricks as in the buffer overflow attack, we have found that the return address is stored at 0xBFFFF38C and that the injected code starts at 0xBFFFF358.
We therefore have to write 0xBFFFF358 to address 0xBFFFF38C.
How? We can use a Python script that creates such a badfile.
Avoid using untrusted user input as the format string in functions like printf, i.e. sanitize the input or pass it as an argument to a fixed format string.
Compilers can detect potential format string vulnerabilities.
However, this detection is not foolproof: a typical compiler flags printf("Hello %x%x%x",5,4) with a warning, but not printf(format,5,4), where format is a variable passed as the first argument of printf.
We can use -Wformat=2, which tells the compiler to warn when the format string is not a string literal.
These are only warnings about potential problems: the compiler still creates the executable.